Snowflake Architecture

 Snowflake is a true self-managed service, meaning:

  • There is no hardware (virtual or physical) to select, install, configure, or manage.
  • There is virtually no software to install, configure, or manage.
  • Ongoing maintenance, management, upgrades, and tuning are handled by Snowflake.

Snowflake runs completely on cloud infrastructure. All components of Snowflake’s service (other than optional command line clients, drivers, and connectors), run in public cloud infrastructures.Snowflake uses virtual compute instances for its compute needs and a storage service for persistent storage of data. Snowflake cannot be run on private cloud infrastructures (on-premises or hosted).Snowflake is not a packaged software offering that can be installed by a user. Snowflake manages all aspects of software installation and updates.


Database Storage

When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage.

Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.

Query Processing

Query execution is performed in the processing layer. Snowflake processes queries using “virtual warehouses”. Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.

Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.

Cloud Services

The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.

Services managed in this layer include:

  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control

Connecting to Snowflake

Snowflake supports multiple ways of connecting to the service:

A web-based user interface from which all aspects of managing and using Snowflake can be accessed.

Command line clients (e.g. SnowSQL) which can also access all aspects of managing and using Snowflake. ODBC and JDBC drivers that can be used by other applications (e.g. Tableau) to connect to Snowflake.

Native connectors (e.g. Python, Spark) that can be used to develop applications for connecting to Snowflake.

Third-party connectors that can be used to connect applications such as ETL tools (e.g. Informatica) and BI tools (e.g. ThoughtSpot) to Snowflake.

No comments:

Post a Comment